We present a sequence-based probabilistic formalism that directly
addresses co-operative effects in networks of interacting positions
in proteins, providing significantly improved contact prediction, as
well as accurate quantitative prediction of free energy changes due
to non-additive effects of multiple mutations. In addition to these
practical considerations, the agreement of our sequence-based
calculations with experimental data for both structure and stability
demonstrates a strong relation between the statistical distribution
of protein sequences produced by natural evolutionary processes,
and the thermodynamic stability of the structures to which these
sequences fold.



This manuscript was originally written in 2002 and was available from http://library.lanl.gov/cgi-bin/getfik
It's being deposited here for greater ease of access.

Our approach is analogous to solving an inverse problem of statistical mechanics: 
determine the physical interaction parameters of a twenty-state spin system given a set of 
sequences drawn from the Boltzmann equilibrium distribution. The sequences we consider 
are sets of aligned protein sequences drawn from variable sequence families defined in the 
Pfam database [1]. We assume that within each family the sequences adopt a common
(but in principle unknown) fold whose underlying structure is reasonably conserved across 
the family. Each sequence of length L of a given family can be viewed as a different 
global state of an L-site, twenty-state (for twenty amino acids) spin system, with spin- 
spin (i.e. residue-residue) interactions determined by (1) the (unknown) structure of the 
associated fold, and (2) the physico-chemical characteristics of the residues. Solving the 
inverse problem to determine the underlying physical interactions addresses "correlation 
at a distance", in which correlations between locally connected sites in an interacting 
network such as a spin system, or a protein, can propagate throughout the network, 
leading to observed correlations between sites that have no direct physical interaction 
[2]. Such propagated correlations can be even greater than correlations between any 
directly connected sites in the system [3]. Previous computational work on abstract 
models of proteins [4], as well as a statistical analysis of the frequency of ion-pairs in
crystal structures of real proteins [5], provided early hints that Boltzmann-like statistics
are associated with aspects of protein architecture. In view of complicated evolutionary 
pressures acting on naturally evolved protein sequences it is surprising that developing 
a strictly thermodynamic approach can, as we demonstrate below, lead to an accurate 
predictive methodology for both protein structure and stability. 

Other work relating sequence statistics to physical interactions, but restricted to
assuming independent (non-interacting) sites, successfully characterized protein-DNA
binding interactions given sequence data [6] [7] [8]. "Semi-rational" protein sequence
design, see e.g. [9], also assumes independent sites, and analyzes natural sequence
variation to suggest mutations leading to greater thermodynamic stability. However,
analysis of mutations in sets of aligned sequences, first for RNA sequences [10] and later
for protein sequences [11], has shown that mutations in pairs of sequence positions are
often correlated. Such pairwise correlations have been used in attempts to predict
spatially proximate residues (contacts) in folded proteins [12] [13] [14] [15] [16] [17] [18].
The hypothesis is that pairs of variable residue positions, possibly distant along the
sequence but spatially proximate in the folded molecule, will display significant
covariation. Published approaches analyze correlations between at most two sequence
positions at a time, hence they inherently assume that each potentially interacting pair
of positions under consideration is physically isolated from all other positions [19]. This
assumption is reasonable for RNA molecules, given the saturating hydrogen bond
interaction between base-pairs, and accuracy of contact prediction for RNA using
pairwise covariation formulae is relatively high [10]. This assumption is not reasonable
for the typically diffuse and networked interactions among amino acids, and accuracy of
contact prediction for proteins using pairwise covariation formulae is relatively poor.
Pairwise covariation formulae were recently used for a qualitative description of
stability changes upon mutations in the SH3 domain, as well as for contact prediction
[18]. Attempts to chain together separate pairwise analyses to approximate interaction
networks in proteins [21] can be illuminating, suggesting that a complete formalism to
address network effects would be fruitful.

The Boltzmann network method presented here does not treat each individual pair 
of sites of interest as isolated from other residues. Instead, we construct a probability 
distribution describing full length sequences of length L for each protein sequence family. 
Any given sequence alignment typically contains enough data to estimate only single and
pairwise amino acid frequencies with reasonable accuracy. One point of departure from 
previous analyses using single and pairwise frequencies is that we adopt an information 
theoretic viewpoint, and ask for the least biased probability distribution, defined over 
all L sites, whose first and second order moments match the single and pairwise amino 
acid frequencies of the given data. "Least biased" is defined to be the maximum entropy 
distribution [20], which in our context may be intuitively viewed as the flattest distribution
among the many distributions that have first and second order moments matching the
amino acid frequencies in the given data [22].
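
As an illustration of the input statistics, the following minimal sketch counts single and pairwise residue frequencies in an alignment. The function name `single_and_pair_frequencies` and the three-column toy alignment are hypothetical and are not taken from the data analyzed here.

```python
from collections import Counter
from itertools import combinations


def single_and_pair_frequencies(alignment):
    """Count single-site and pairwise residue frequencies in an alignment.

    `alignment` is a list of equal-length strings, one aligned sequence each.
    Returns (single, pair): single[(i, a)] is the fraction of sequences with
    residue a at position i; pair[(i, j, a, b)], with i < j, is the fraction
    with a at position i and b at position j.
    """
    n_seq = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    single = Counter()
    pair = Counter()
    for seq in alignment:
        for i, a in enumerate(seq):
            single[(i, a)] += 1
        for i, j in combinations(range(length), 2):
            pair[(i, j, seq[i], seq[j])] += 1
    single = {key: count / n_seq for key, count in single.items()}
    pair = {key: count / n_seq for key, count in pair.items()}
    return single, pair


# Hypothetical toy alignment (four sequences, three columns).
toy_alignment = ["ACD", "ACE", "GCD", "ACD"]
single, pair = single_and_pair_frequencies(toy_alignment)
```

These counted frequencies are the only quantities the maximum entropy distribution is required to reproduce.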

The maximum entropy distribution whose moments match a given set of single and
pairwise amino acid frequencies may be written in the following form [23], reminiscent of
thermal Boltzmann statistics

$$P(X) = \frac{\exp[-E(X)]}{Z} \tag{1}$$

where E is a sum of single and pairwise interactions among potentially all amino acids

$$E(X) = \sum_{\alpha\beta ij} \lambda_{ij}^{\alpha\beta} X_i^{\alpha} X_j^{\beta} + \sum_{\alpha i} \lambda_i^{\alpha} X_i^{\alpha} \tag{2}$$

$X_i^{\alpha}$ denotes the residue present at position $i$ in sequence $X$: it has the value 1 if amino acid $\alpha$
is present at sequence position $i$, and is 0 otherwise. The $\lambda$'s are adjustable parameters (to
be determined) such that the calculated first and second order moments of this distribution
match the single and pairwise amino acid frequencies in the given sequence alignment.
$i$ and $j$ label sequence positions (1 to L), and $\alpha$ and $\beta$ label the twenty possible amino
acids. Z is a normalization factor. It can be shown [25] that matching the moments of
the maximum entropy distribution to the given sequence data is equivalent to maximizing
the log-likelihood of the given sequence data given the parametric form, Eqns. (1,2), for
the probability distribution. This formalism is related to Boltzmann Machines [27] and
Graphical Models [28], used in other contexts.
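
To make the Boltzmann form concrete, here is a minimal sketch that evaluates E(X) and P(X) by exhaustive enumeration for a toy system. The three-site, two-state size, the parameter values, and the names `energy`, `boltzmann_distribution`, `h` and `J` are illustrative assumptions; each pair of sites is counted once (i < j), which is one possible convention for the pairwise sum.

```python
import math
from itertools import product


def energy(seq, h, J):
    """E(X): single-site terms h[i][a] plus pairwise terms J[(i, j)][a][b]."""
    e = sum(h[i][a] for i, a in enumerate(seq))
    for (i, j), coupling in J.items():
        e += coupling[seq[i]][seq[j]]
    return e


def boltzmann_distribution(n_sites, n_states, h, J):
    """Return {sequence: P(X)} with P(X) = exp(-E(X)) / Z, by full enumeration."""
    states = list(product(range(n_states), repeat=n_sites))
    weights = [math.exp(-energy(s, h, J)) for s in states]
    z = sum(weights)  # normalization factor Z
    return {s: w / z for s, w in zip(states, weights)}


# Hypothetical parameters: no single-site bias, sites 0 and 1 prefer to agree.
h = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
J = {(0, 1): [[-1.0, 1.0], [1.0, -1.0]]}
dist = boltzmann_distribution(3, 2, h, J)
```

Exhaustive enumeration is only feasible for very small systems; for realistic sequence lengths the sum defining Z has $20^L$ terms.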




So far, E is merely a suggestive symbol appearing in a probability distribution, Eqns.
(1,2), describing sequence statistics of an alignment. However, it is shown below that if the
$\lambda$'s are adjusted so that the moments of the distribution match the given amino acid
frequencies, then E is highly correlated with a real, physical, thermodynamic free energy
of unfolding. Furthermore, we use the probability distribution over all L sites, Eqns.
(1,2), to resolve issues of correlation at a distance (network effects) in proteins, resulting
in significantly improved contact prediction from sequence information.

We consider aligned sequences for eleven domains [29] taken from the Pfam [1] database,
with associated x-ray crystal structures taken from the Protein Data Bank [30]. These
domains were chosen to be diverse in sequence (less than 50% pairwise sequence identity)
and to have more than 200 sequences per family. The distance between a pair of residues
was defined to be the distance between their carbon $\beta$ atoms, and pairs of residues with
carbon $\beta$ distance of less than 7 Angstroms were defined to be in contact (carbon $\alpha$
coordinates were used for glycines). Results reported below are robust to changes in definition
of contact.

Prediction of which residues are directly interacting (i.e. in physical contact) uses the
concept of conditional mutual information [20] applied to P(X) after the $\lambda$'s have been
determined for each sequence family. In our context, conditional mutual information,
CMI, measures the degree of covariation between residues at sequence positions i and j
that is solely due to direct effects of i on j (and vice versa), factoring out contributions to
the correlation between i and j caused by interaction of both i and j with the rest of the
network of residues. It is a discrete (and nonlinear) analogue of linear partial correlation
analysis [31] [32] and is intuitively described by this process: (a) freeze all residues other
than those at i and j to a fixed state, thus preventing information propagation through
the rest of the network, (b) calculate the mutual information between i and j, using P(X),
above, with the rest of the network frozen, and (c) average this result over all possible
frozen states of the rest of the network [33]. Pairs of sites with high CMI (over a
user-defined threshold) are predicted to be in contact.
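
The three-step description of CMI can be sketched on a toy network small enough to enumerate. The example below assumes a hypothetical three-site, two-state chain 0-1-2 with no direct interaction between sites 0 and 2, and assumes that the average in step (c) is weighted by the probability of each frozen state (the standard definition of conditional mutual information). All function names are illustrative.

```python
import math
from itertools import product


def chain_distribution(coupling):
    """Joint distribution of a 3-site, 2-state chain 0-1-2 (no direct 0-2 term)."""
    def energy(x):
        # Energy is lowered by `coupling` for each agreeing neighbour pair.
        return -coupling * ((x[0] == x[1]) + (x[1] == x[2]))
    states = list(product((0, 1), repeat=3))
    weights = [math.exp(-energy(x)) for x in states]
    z = sum(weights)
    return {x: w / z for x, w in zip(states, weights)}


def marginal(p, sites):
    """Marginal distribution of `p` over the given tuple of site indices."""
    out = {}
    for x, px in p.items():
        key = tuple(x[s] for s in sites)
        out[key] = out.get(key, 0.0) + px
    return out


def mutual_information(p, i, j):
    """Ordinary pairwise mutual information between sites i and j (in nats)."""
    pij, pi, pj = marginal(p, (i, j)), marginal(p, (i,)), marginal(p, (j,))
    return sum(v * math.log(v / (pi[(a,)] * pj[(b,)]))
               for (a, b), v in pij.items() if v > 0)


def conditional_mutual_information(p, i, j):
    """Mutual information between i and j given all remaining sites."""
    n_sites = len(next(iter(p)))
    rest = tuple(s for s in range(n_sites) if s not in (i, j))
    total = 0.0
    for r, pr in marginal(p, rest).items():
        if pr <= 0:
            continue
        # (a) freeze the rest of the network to state r
        frozen = {x: px / pr for x, px in p.items()
                  if tuple(x[s] for s in rest) == r}
        # (b) mutual information between i and j with the rest frozen,
        # (c) weighted by the probability of that frozen state
        total += pr * mutual_information(frozen, i, j)
    return total


p = chain_distribution(1.0)
mi_ends = mutual_information(p, 0, 2)               # propagated correlation
cmi_ends = conditional_mutual_information(p, 0, 2)  # direct effect only
cmi_neighbours = conditional_mutual_information(p, 0, 1)
```

In this toy chain the end sites have nonzero mutual information purely through the middle site, while their CMI vanishes up to rounding error; the directly coupled neighbours retain a clearly positive CMI.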

Two quantities, specificity and sensitivity, are typically used to characterize predictive
ability. Specificity is defined as the fraction of predicted contacts that are actual contacts
(as defined by carbon $\beta$ distances) i.e. the overall probability that a predicted contact
is correct. Sensitivity is defined as the fraction of actual contacts that are correctly
predicted. High specificity is more desirable than high sensitivity, because in our context
predicting even a small number of contacts with high accuracy provides extremely
valuable constraints on ab initio protein structure calculations [34] [35]. Hereafter we refer to
specificity as "accuracy". To survey accuracy as a function of CMI threshold we
successively lowered the CMI threshold, in effect walking down a list of predicted contact pairs
ordered by CMI value, for each domain. This process yields accuracy of prediction as a
function of the number of pairs predicted to be in contact [36].
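
The walk down the ranked list can be sketched as follows. The scores and contact set are hypothetical toy values, and the function names are illustrative; "accuracy" follows the definition of specificity used in the text (fraction of predicted contacts that are actual contacts).

```python
def accuracy_curve(scored_pairs, true_contacts):
    """Accuracy after each successive prediction, walking down the ranked list.

    scored_pairs: list of ((i, j), score); true_contacts: set of (i, j).
    """
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    curve, hits = [], 0
    for n_predicted, (pair, _score) in enumerate(ranked, start=1):
        hits += pair in true_contacts
        curve.append(hits / n_predicted)
    return curve


def sensitivity(scored_pairs, true_contacts, threshold):
    """Fraction of actual contacts whose score exceeds the threshold."""
    predicted = {pair for pair, score in scored_pairs if score > threshold}
    return len(predicted & true_contacts) / len(true_contacts)


# Hypothetical toy scores and contacts (not results from this work).
toy_scores = [((1, 5), 0.9), ((2, 7), 0.8), ((3, 4), 0.4), ((1, 9), 0.2)]
toy_contacts = {(1, 5), (3, 4), (6, 8)}
curve = accuracy_curve(toy_scores, toy_contacts)
```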

To compare our method to others we also analyzed contact prediction accuracy using
(a) a pairwise covariation measure [18] (denoted as ΦAM for Φ Association Method [37],
which we believe to be the most accurate of published methods), (b) conventional pairwise
mutual information [19] (denoted as MI) and (c) a baseline reference resulting from random
selection of position pairs (denoted as Random). The measure used in (a) above also
incorporates some correction for phylogenetic artifacts. Fig. (1) shows overlaid curves
for accuracy of contact prediction via the different methods, versus number of predicted
contacts, for the SH3 domain. The most accurate method is the Boltzmann network
method, which uses conditional mutual information to predict contact pairs. Accuracy
varies somewhat from family to family, therefore we show the averaged accuracy over
eleven domains in Fig. (2) using the same four predictive methods. The Boltzmann
network method has on average consistently higher accuracy for a greater number of
predicted contacts. Predicted contacts for the eleven domains using the Boltzmann network
method are available in the supplemental material [38].

Figure 1: Accuracy of Contact Prediction versus Number of Predicted Contacts:
Accuracy of prediction (y-axis) vs. number of predicted contact pairs (x-axis) for
the SH3 domain is shown. Boltzmann, the top curve, is the result of the Boltzmann
network method. ΦAM is the result of what we believe to be the most accurate published
pairwise covariation method [18] (does not address network interactions, does address
phylogenetic artifacts), MI is the result of pairwise mutual information (does not address
network interactions), and Random is the average result of picking at random a specified
number of contacts. The inset blows up the region from 1 to 50 predicted contacts. The
accuracy of contact prediction using the Boltzmann network method, which incorporates
co-operative effects among residues, significantly exceeds that of other methods.

Figure 2: Accuracy of Contact Prediction as a [...] Fifty Predicted
Contacts: An Average over eleven domain families, for the same predictive methods of
Fig. (1). Boltzmann, the top curve, is the result of the Boltzmann network method
presented here and has significantly higher average accuracy, demonstrating the importance
of addressing co-operative effects within proteins.



The maximum entropy probability distribution, Eqn. (1), has a thermal, Boltzmann
form with exponent E(X). After the $\lambda$'s have been determined for a given sequence
alignment, E(X) assigns an "energy" value to any sequence X. Interpreting E(X) as
an effective free energy relative to the unfolded state allows a free energy of unfolding
[39], $\Delta G = -E(X)$, to be predicted using our formalism. Changes in sequence, X, will
change E(X) and hence the $\Delta G$ of sequence mutants can be calculated and compared to
experiment. Experimentally determined melting temperatures (assumed proportional to
the free energy of unfolding) for wildtype Fyn SH3 sequence, and for a set of single, double
and triple mutants of the wildtype were reported in [18]. To assess how well co-operative
non-additive effects are captured by our formalism, we calculated $\Delta G$ values after the
$\lambda$'s had been determined in two different ways: (1) the interaction parameters, $\lambda_{ij}^{\alpha\beta}$, were
allowed to adjust during the determination of the probability distribution P(X), (2) the
interaction parameters, $\lambda_{ij}^{\alpha\beta}$, were held fixed to zero, allowing only additive effects to be
captured by the remaining adjustable single site $\lambda_i^{\alpha}$ parameters [40]. As will be seen below,
the correct prediction of the effects of even single site mutants requires consideration of the
other sites with which it interacts. The eleven residues identified by a structural analysis
[18] to be in the hydrophobic core of the SH3 domain were selected for use in assessing $\Delta G$
prediction, i.e. the $\lambda$ parameters used for computing $\Delta G$ allowed potential interaction
among all eleven sites of the hydrophobic core. Significant sequence variation is necessary
input information for our method, and so within this set of eleven core positions we report
$\Delta G$ values for mutations involving the three positions (26, 39 and 50 in the numbering
scheme of [18]) that displayed the highest mutual information.
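
The difference between the additive and the interacting calculation can be illustrated with a minimal sketch. The two-site, two-state system and all parameter values below are hypothetical and chosen only to show how a pairwise term makes the effect of a double mutant differ from the sum of its single mutants; they are not fitted parameters.

```python
def energy(seq, h, J):
    """E(X): single-site terms h[i][a] plus pairwise terms J[(i, j)][a][b]."""
    e = sum(h[i][a] for i, a in enumerate(seq))
    e += sum(c[seq[i]][seq[j]] for (i, j), c in J.items())
    return e


def delta_delta_g(mutant, wildtype, h, J):
    """Free energy of unfolding of the mutant relative to wildtype, with dG = -E(X)."""
    return -energy(mutant, h, J) + energy(wildtype, h, J)


# Hypothetical parameters: state 0 is wildtype, state 1 is the mutant residue.
h = [[0.0, 0.5], [0.0, 0.3]]                   # each single mutation is destabilizing
J_network = {(0, 1): [[0.0, 0.0], [0.0, -1.0]]}  # the two mutant residues compensate
J_additive = {}                                # interaction terms held at zero

wt, single_a, single_b, double = (0, 0), (1, 0), (0, 1), (1, 1)
for label, J in (("network", J_network), ("additive", J_additive)):
    singles = delta_delta_g(single_a, wt, h, J) + delta_delta_g(single_b, wt, h, J)
    both = delta_delta_g(double, wt, h, J)
    print(label, "sum of singles:", singles, "double mutant:", both)
```

With the interaction term present the double mutant is predicted to be stabilized even though each single mutant is destabilized, which is the kind of compensatory effect an additive model cannot represent.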

Experimentally determined melting temperatures were reported [18] for four single,
four double, and three triple mutants, in addition to the wildtype for these three positions.
In Fig. (3) the difference of the mutant and wildtype $\Delta G$'s as computed by our method for
these mutant domains is shown to be highly correlated (absolute value of correlation 0.91)
with the experimentally measured melting temperatures. If non-additive and co-operative
effects are disallowed by holding the interaction terms to zero then the correlation is poor
(absolute value of correlation 0.02) and the signs of the predicted $\Delta G$ are incorrect, Fig.
(4).

The high correlation of predicted and measured $\Delta G$ shown in Fig. (3) suggests
performing a computational survey of all possible ($20^3$) mutations to search for other
interesting single, double and triple mutants. We found no statistically significant prediction
of a mutant considerably more stable than wildtype, although the single site mutant F26L
has a predicted melting temperature similar to wildtype with a value of 84.9. Predictions
of melting temperature for other interesting three site mutants are available in the
supplemental material [38]. Results of an extensive computational search among all ($20^{11}$)
possible sequences defining a total redesign of the eleven site hydrophobic core of the SH3
domain are also presented [38].

The success of the Boltzmann network formalism in predicting free energy changes
upon mutation clearly demonstrates a deep relationship between the statistics of sequences
selected by natural evolutionary processes and the thermal stability of the structures to
which these sequences fold. However, such a strong relationship would not necessarily
be expected given that protein sequences produced by evolution are strongly affected by
functional constraints in addition to stability constraints [41]. A possible explanation
of the statistics-stability relation is that functional properties are typically confined to
localized regions of a protein, e.g. binding sites, and that optimization of small local
regions for functional fitness occurs after global selection for sequences that stably fold.
An independent, computational investigation of the extent to which sequences are shaped
by natural selection for stability was published recently [42], although contact prediction
and prediction of free energy changes were not explicitly addressed. In contrast to our
sequence based approach, this work used structural information, combined with an
all-atom free energy function incorporating a variety of physical effects, to computationally
design sequences for a variety of domains. The native, naturally occurring sequence for
each structure considered was found to be close to optimal for each structure, and for SH3,
the pairwise correlations between sites in a set of computationally designed sequences
recapitulated the correlations observed in a set of native SH3 core sequences. The extent
to which the Boltzmann network's energy function, E(X), involving empirical parameters
determined solely from sequence information for each domain family, can be identified
with the physical/structural effects defining the energy function of this structure-based
complementary study remains an interesting issue.

Figure 3: Measured Melting Temperature versus Predicted Free Energy of
Unfolding: Network Interactions Allowed. The free energy of unfolding, $\Delta G$, computed
using the Boltzmann network method versus experimentally measured melting
temperature for eleven mutants of the SH3 domain (four singles, four doubles, three triples) as
well as the wildtype. Co-operative and non-additive effects were allowed, resulting
in a good correlation of computation with experiment (absolute value of the correlation
is 0.91). The single site mutant, I50F, as discussed by Larson, involves mutating to
a residue more frequent in the alignment and yet is measured to be quite destabilizing
with a measured melting temperature of 45.3. Only if network interactions are allowed
is this single site mutant correctly predicted as quite destabilizing. The triple mutant
F26I/A39G/I50F, with a measured melting temperature of 73.7, involves I50F with
additional compensatory second site mutations. It is correctly predicted as just mildly
destabilizing compared to wildtype only if network interactions are allowed. Comparison
of this figure (network interactions included) to Fig. (4) (network interactions excluded)
shows in general the importance of network interactions.

Figure 4: Measured Melting Temperature versus Predicted Free Energy of
Unfolding: Network Interactions Not Allowed. The free energy of unfolding, $\Delta G$,
versus experimentally measured melting temperature for the same eleven mutants of the
SH3 domain and for wildtype, as Fig. (3), but when co-operative and non-additive effects
are disallowed by holding the interaction parameters, $\lambda_{ij}^{\alpha\beta}$, to zero. There is a dramatic
decrease in correlation of computation with experiment (absolute value of the correlation
is now 0.01), and even the signs of the stability changes are incorrect when network
interactions are disallowed.

Limiting factors in application of the Boltzmann network algorithm include (1) the
amount of naturally evolved sequence data currently available per family (size of the
sequence alignment), and (2) the phylogenetic relatedness (and associated selection
artifacts) of these sequences. Modifications to the algorithm presented here, e.g. (1)
consideration of statistical significance of the fitted $\lambda$ parameters, and (2) addressing phylogenetic
relationships of sequences in an alignment, have the potential to further increase accuracy
using naturally evolved sequence sets.

However, the ability to create in the laboratory totally novel sequences for protein
domains via artificial evolution techniques such as phage display [43] [44] promises new,
rich, and diverse sequence sets with well characterized in vitro selection pressures. Such
sequence data, when available in greater quantity, and analyzed with the methods herein,
offer a new paradigm for sequence based structure prediction, and for the computational
design of sequences with preferred properties.
Appendix: Supporting Online Material

1 Determining the $\lambda$ Parameters by Maximum Likelihood Analysis

The maximum entropy probability distribution subject to constraints on the first and
second order moments has an exponential form

$$P(X) = \frac{\exp[-E(X)]}{Z}$$

where E is a sum of single and pairwise interactions,

$$E(X) = \sum_{\alpha\beta ij} \lambda_{ij}^{\alpha\beta} X_i^{\alpha} X_j^{\beta} + \sum_{\alpha i} \lambda_i^{\alpha} X_i^{\alpha}$$

and

$$Z = \sum_{X} \exp[-E(X)]$$

is a sum over all possible ($20^L$) sequences of length L which normalizes the distribution [1].
The $\lambda$'s are Lagrange multipliers implementing the constraints that the first and second
order moments of the distribution match the single and pairwise amino acid frequencies in
a given sequence alignment. Each sequence X of the alignment may therefore be assigned
a probability, P(X), which is a function of the $\lambda$'s.

For each sequence alignment considered one may write the joint probability of all S
sequences of the alignment as a function of the $\lambda$'s (assuming that the sequences are
independent) as

$$P(\text{Sequences}) = \prod_{s=1}^{S} \frac{\exp[-E(X(s))]}{Z}$$

where s references each sequence of the alignment. Although naturally evolved sequences
that are related by a phylogenetic tree are not independent, making the assumption
of independence, for simplicity, still yields results of high accuracy (this assumption of
sequence independence is of course unrelated to issues of site independence within a
sequence). Properly addressing the phylogenetic relatedness of sequences is complicated,
but has the potential to increase accuracy still further.
Taking logs of both sides yields

$$\begin{aligned}
\log[P(\text{Sequences})] &= -\Big[ \sum_{s\,\alpha\beta ij} \lambda_{ij}^{\alpha\beta} X_i^{\alpha}(s) X_j^{\beta}(s) + \sum_{s\,\alpha i} \lambda_i^{\alpha} X_i^{\alpha}(s) + S \log(Z) \Big] \\
&= -S \Big[ \sum_{\alpha\beta ij} \lambda_{ij}^{\alpha\beta}\, \overline{X_i^{\alpha} X_j^{\beta}} + \sum_{\alpha i} \lambda_i^{\alpha}\, \overline{X_i^{\alpha}} + \log(Z) \Big]
\end{aligned}$$

Here, $\overline{X_i^{\alpha}}$ and $\overline{X_i^{\alpha} X_j^{\beta}}$ represent, respectively, the single and pairwise amino acid frequencies
obtained by simple counting in the given aligned sequence data set.

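A steepest ascent step, maximizing $\log[P(\text{Sequences})]$, changes the $\lambda$ parameters by
an amount proportional to the gradient of $\log[P(\text{Sequences})]$ with respect to the $\lambda$'s

$$\Delta\lambda_{ij}^{\alpha\beta} \propto \frac{\partial \log[P(\text{Sequences})]}{\partial \lambda_{ij}^{\alpha\beta}} \propto -\left( \overline{X_i^{\alpha} X_j^{\beta}} - \langle X_i^{\alpha} X_j^{\beta} \rangle \right)$$

$$\Delta\lambda_{i}^{\alpha} \propto \frac{\partial \log[P(\text{Sequences})]}{\partial \lambda_{i}^{\alpha}} \propto -\left( \overline{X_i^{\alpha}} - \langle X_i^{\alpha} \rangle \right)$$

where

$$\langle X_i^{\alpha} X_j^{\beta} \rangle = -\frac{\partial \log(Z)}{\partial \lambda_{ij}^{\alpha\beta}}, \qquad \langle X_i^{\alpha} \rangle = -\frac{\partial \log(Z)}{\partial \lambda_{i}^{\alpha}}$$

represent the second and first order moments of the distribution, respectively. In
principle, evaluating these moments involves ($20^L$) summations; however, since they are simple
averages they may be efficiently estimated in practice via Monte Carlo [2]. Once the
moments have been estimated at a current setting of the $\lambda$'s, the $\lambda$'s are changed by an
amount proportional to $\Delta\lambda$ and the process is iterated to convergence. This procedure is
essentially the "training" algorithm for a Boltzmann machine [3] when there are no hidden
units. Note that at the maximum of $\log[P(\text{Sequences})]$, i.e., when $\Delta\lambda_{ij}^{\alpha\beta} = 0$, $\Delta\lambda_i^{\alpha} = 0$,
the single and pairwise moments of the distribution match the single and pairwise amino
acid frequencies of the given sequence alignment. Furthermore, it is possible to show by
consideration of mixed second derivatives, such as

$$\frac{\partial^2 \log[P(\text{Sequences})]}{\partial \lambda_{ij}^{\alpha\beta}\, \partial \lambda_{kl}^{\gamma\delta}}$$

that $\log[P(\text{Sequences})]$ is a concave function of the $\lambda$'s (equivalently, its negative is
convex), so that there are no local maxima other than the global maximum.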
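
The iterative moment-matching procedure can be sketched for a system small enough that the model moments are computed exactly rather than by Monte Carlo. The sketch below assumes a hypothetical two-site, two-state system, counts each pair of sites once (i < j), and uses an arbitrary fixed step size; the names `fit_lambdas`, `rate` and `n_steps` are illustrative and are not the settings used for the reported results.

```python
import math
from itertools import product


def fit_lambdas(target_single, target_pair, n_sites, n_states,
                rate=0.2, n_steps=5000):
    """Steepest ascent on the log-likelihood with exactly enumerated moments."""
    sites = range(n_sites)
    pairs = [(i, j) for i in sites for j in sites if i < j]
    lam1 = {(i, a): 0.0 for i in sites for a in range(n_states)}
    lam2 = {(i, j, a, b): 0.0 for (i, j) in pairs
            for a in range(n_states) for b in range(n_states)}
    states = list(product(range(n_states), repeat=n_sites))

    def model_moments():
        weights = []
        for x in states:
            e = sum(lam1[(i, x[i])] for i in sites)
            e += sum(lam2[(i, j, x[i], x[j])] for (i, j) in pairs)
            weights.append(math.exp(-e))  # Boltzmann weight exp(-E(X))
        z = sum(weights)
        m1 = dict.fromkeys(lam1, 0.0)
        m2 = dict.fromkeys(lam2, 0.0)
        for x, w in zip(states, weights):
            for i in sites:
                m1[(i, x[i])] += w / z
            for (i, j) in pairs:
                m2[(i, j, x[i], x[j])] += w / z
        return m1, m2

    for _ in range(n_steps):
        m1, m2 = model_moments()
        # Gradient of the per-sequence log-likelihood: model moment minus
        # counted frequency (raising a lambda lowers that residue's probability).
        for key in lam1:
            lam1[key] += rate * (m1[key] - target_single.get(key, 0.0))
        for key in lam2:
            lam2[key] += rate * (m2[key] - target_pair.get(key, 0.0))
    return lam1, lam2, model_moments()


# Hypothetical target frequencies for two correlated two-state sites.
target_pair = {(0, 1, 0, 0): 0.4, (0, 1, 0, 1): 0.1,
               (0, 1, 1, 0): 0.1, (0, 1, 1, 1): 0.4}
target_single = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.5}
lam1, lam2, (fit_single, fit_pair) = fit_lambdas(target_single, target_pair, 2, 2)
```

After convergence the fitted moments reproduce the target frequencies, which is the stationarity condition stated above.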

For the results reported in the main text, the $\lambda$'s were initialized with the $\lambda_{ij}^{\alpha\beta}$
interaction terms set to zero, and the $\lambda_i^{\alpha}$ terms chosen to match the single site amino acid
frequencies of each given sequence alignment. To evaluate moments of the distribution
given some current values of the $\lambda$'s, 400,000 sequence configurations were obtained by
generating a Monte Carlo chain of 4,000,000 steps, and keeping every tenth configuration
of this chain when estimating the $\langle \cdot \rangle$ moments. The changes in the $\lambda$'s, $\Delta\lambda$, are zero, and
the iterative process converges, when the moments exactly match the amino acid
frequencies. This occurs when the likelihood is a maximum and the gradient is zero. Effective
convergence was reached in (very roughly) on the order of 10,000 to 15,000 changes of the
$\lambda$'s, or 40 to 60 hours of computer time (depending on domain size), on a dedicated single
processor 1 GHz CPU with 500 megabytes of memory. No significant effort was made to
optimize code beyond addressing the most obvious inefficiencies.
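
The Monte Carlo estimate of a moment, with every tenth configuration kept, can be sketched as follows. This is a generic single-site Metropolis sampler on a hypothetical three-site, two-state chain, not the code used for the reported results; the chain length is reduced so that the example runs quickly, and the seed and names are illustrative.

```python
import math
import random


def energy(x, coupling=1.0):
    # Toy 3-site chain: energy is lowered by `coupling` for each agreeing
    # pair of neighbouring sites (hypothetical parameters).
    return -coupling * ((x[0] == x[1]) + (x[1] == x[2]))


def metropolis_samples(n_steps, keep_every, n_states=2, n_sites=3, seed=0):
    """Run a single-site Metropolis chain, keeping every `keep_every`-th state."""
    rng = random.Random(seed)
    x = [rng.randrange(n_states) for _ in range(n_sites)]
    e = energy(x)
    kept = []
    for step in range(1, n_steps + 1):
        site = rng.randrange(n_sites)
        old = x[site]
        x[site] = rng.randrange(n_states)  # symmetric proposal
        e_new = energy(x)
        # Accept with probability min(1, exp(-(E_new - E_old))).
        if e_new <= e or rng.random() < math.exp(e - e_new):
            e = e_new
        else:
            x[site] = old  # reject: restore the previous residue
        if step % keep_every == 0:
            kept.append(tuple(x))
    return kept


samples = metropolis_samples(n_steps=200_000, keep_every=10)
# Estimated probability that sites 0 and 1 carry the same state, i.e. the
# pairwise moment summed over matching states.
pair_moment = sum(s[0] == s[1] for s in samples) / len(samples)
exact = math.e / (math.e + 1.0)  # exact value for this open chain
```

Keeping only every tenth configuration reduces the correlation between successive samples at the cost of discarding most of the chain.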

2 Predicted Contacts for Eleven Families 

The top 50 predicted contact pairs, using the Boltzmann network method (see main text),
for each of the 11 Pfam families follow. Each column, representing one protein family, is
ordered by descending value of conditional mutual information. The numbering scheme 
for specifying position pairs of each predicted contact uses the residue number appearing 
on the "ATOM" lines in the PDB files listed at the top of each column. 
3 A Computational Survey of ($20^3$) Mutants of the Hydrophobic Core of Fyn SH3



The high correlation reported in the main text for calculated $\Delta G$ values with measured
melting temperatures, for mutations in positions 26, 39 and 50 (numbering scheme of
reference [4]), suggests performing a computational survey of all ($20^3$) possible mutants in
these positions. Using the regression line of Fig. (3) of the main text enables conversion
of any calculated $\Delta G$ to a predicted melting temperature. We surveyed all ($20^3$) possible
mutants in these three positions and selected those mutant sequences with predicted
melting temperatures within the range of measured melting temperatures of Fig. (3), in
effect interpolating new sequences between existing sequences with measured melting
temperatures. Regarding sequences outside of this range: sequences with significantly
higher melting temperatures were not found; on the other end of the temperature range,
sequences utilizing amino acids that were rare in the initial sequence alignment depend on
$\lambda$ parameters that are poorly determined, and were eliminated from the set of significant
predictions. Fifty such triple mutants, ordered by predicted melting temperature, are listed
below. The numbering scheme is that of reference [4]: residues listed correspond to
positions 4,6,10,18,20,26,28,37,39,50,55.
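
The conversion from a calculated free energy to a predicted melting temperature via a regression line, followed by the range filter, can be sketched as below. The numbers are hypothetical, exactly linear toy values chosen for clarity and are not the measurements of reference [4]; the function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical toy values (not data from the paper).
toy_delta_g = [0.0, 1.0, 2.0, 3.0]
toy_melting_temp = [50.0, 60.0, 70.0, 80.0]
slope, intercept = fit_line(toy_delta_g, toy_melting_temp)


def predicted_melting_temperature(delta_g):
    """Map a calculated free energy onto the fitted regression line."""
    return slope * delta_g + intercept


def within_measured_range(tm, measured):
    """Keep only predictions that interpolate between measured temperatures."""
    return min(measured) <= tm <= max(measured)
```

For example, `within_measured_range(predicted_melting_temperature(1.5), toy_melting_temp)` accepts an interpolated prediction, while a prediction extrapolated beyond the measured range is rejected.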
4 A Computational Survey of ($20^{11}$) Hydrophobic Core Sequences of Fyn SH3

A computational survey of sequence space can also be performed using the Boltzmann 
network formalism, even when the number of potential sequences in the survey precludes 
exhaustive enumeration. We illustrate this by suggesting complete redesigns for the eleven 
residue hydrophobic core sequence of SH3, for which an exhaustive survey of ($20^{11}$)
possible core sequences is infeasible. A stochastic search via simulated annealing, using
the modified Lam schedule for temperature changes [5,6], was used to compile a list 
of the 50 most stable sequences identified during the annealing process. Of these pre- 
dicted core sequences, 26 occur in the initial sequence alignment, i.e. occur in naturally 
evolved proteins, and constitute predictions of the melting temperatures of these nat- 
ural sequences. The remaining 24 sequences constitute predictions of new stable core 
sequences. Residues listed below correspond to positions 4,6,10,18,20,26,28,37,39,50,55 in 
the numbering scheme of reference [4]. 
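
A stochastic search of this kind can be sketched as follows. Note that this sketch substitutes a simple geometric cooling schedule for the modified Lam schedule used here, and searches a hypothetical four-site, three-state toy energy rather than the fitted E(X); the names `anneal`, `toy_energy` and `target` are illustrative. Lower E corresponds to a larger predicted free energy of unfolding, so the most stable sequences are those of lowest energy.

```python
import heapq
import math
import random


def anneal(energy, n_sites, n_states, n_steps=5000, t_start=2.0, t_end=0.05,
           n_keep=5, seed=0):
    """Simulated annealing; returns the n_keep lowest-energy sequences visited."""
    rng = random.Random(seed)
    x = [rng.randrange(n_states) for _ in range(n_sites)]
    e = energy(x)
    visited = {tuple(x): e}  # every accepted sequence and its energy
    cooling = (t_end / t_start) ** (1.0 / n_steps)  # geometric schedule
    t = t_start
    for _ in range(n_steps):
        site = rng.randrange(n_sites)
        old = x[site]
        x[site] = rng.randrange(n_states)
        e_new = energy(x)
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-(E_new - E_old) / t).
        if e_new <= e or rng.random() < math.exp((e - e_new) / t):
            e = e_new
            visited[tuple(x)] = e
        else:
            x[site] = old
        t *= cooling
    return heapq.nsmallest(n_keep, visited.items(), key=lambda item: item[1])


# Hypothetical toy energy: number of positions differing from a target sequence.
target = (2, 0, 1, 1)


def toy_energy(x):
    return sum(a != b for a, b in zip(x, target))


best = anneal(toy_energy, n_sites=4, n_states=3)
```

The returned list plays the role of the compiled list of most stable sequences; for a fitted model one would pass E(X) in place of `toy_energy`.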